Studies of Admissions Testing and Handicapped People



Most admissions testing programs have long made 
accommodations for handicapped examinees, though practices 
have varied across programs and limited research has been 
undertaken to evaluate such test modifications. Regulations 
under Section 504 of the Rehabilitation Act of 1973 impose 
new requirements on institutional users, and indirectly on
admissions test sponsors and developers, in order to protect 
the rights of handicapped persons. The Regulations have not 
been strictly enforced since many have argued that they 
conflict with present technical capabilities of test 
developers. In 1982, a Panel appointed by the National 
Research Council released a detailed report and 
recommendations calling for research on the validity and 
comparability of scores for handicapped persons. 

Due to a shared concern for these issues, College Board, 
Educational Testing Service, and Graduate Record Examinations 
Board initiated a series of studies in June 1983. The
primary objectives are: 



To develop an improved base of information
concerning the testing of handicapped 
populations. 

To evaluate and improve wherever possible the 
accuracy of assessment for handicapped 
persons, especially test scaling and 
predictive validity.

To evaluate and enhance wherever possible the 
fairness and comparability of tests for 
handicapped and nonhandicapped examinees.



This is one of a series of reports on the project, which
will continue through 1986. Opinions expressed are those of
the authors.

Abstract

This study examined the psychometric characteristics of the 
Scholastic Aptitude Test (SAT) administered under special
conditions for nine handicapped groups. Information about
test characteristics is central to judging the accuracy and 
fairness of scores from SAT special administrations. 

Four psychometric characteristics were studied: level
of test performance, test reliability, speededness, and
extent of unexpected differential item performance.
Psychometric comparisons were made between a nonhandicapped
sample and each of nine different handicapped 
classifications. These contrasts were carried out twice; 
that is, they were replicated across two forms of the same 
test. The use of two samples taking different forms served 
to increase confidence in the stability of results and their 
applicability to other forms of the SAT. 

Results of the study showed that visually impaired 
students and those with physical handicaps achieved mean 
scores generally comparable to students taking the SAT in 
national administrations. Learning disabled and hearing 
impaired students scored lower than their nondisabled peers. 
Differences between Verbal and Mathematical performance were
also comparable to those for the nondisabled reference group
in all but the hearing impaired-regular type test and
visually impaired-braille test samples. Hearing
impaired-regular type students scored higher on Mathematical
than on Verbal relative to their nondisabled peers, while
visually impaired-braille students showed no consistent
superiority for Mathematical over Verbal.

Analysis of test reliability revealed no practical 
differences in measurement precision across groups. Data on
test speededness showed no evidence of disadvantage for 
disabled students; the amount of extended time allotted 
through special administrations appears to allow roughly 
equivalent proportions of handicapped and nondisabled 
examinees to complete the test. 

Because of the large number of groups and test items 
involved, unexpected differential item performance was 
examined through a two-stage procedure. The first stage 
centered on the performance of item clusters. Individual 
items composing clusters showing questionable performance 
were then examined. This two-stage procedure revealed only 
a few instances of differential item performance localized 
to visually impaired students taking the braille test. 

It is concluded that, with the exception of performance
level, the psychometric characteristics of the SAT are
generally comparable for the handicapped and nondisabled
groups studied. These results lend support to the
contention that scores from special administrations are fair
and accurate measures of the developed scholastic abilities
of handicapped students. Further studies of these scores,
in particular of their factor structure and predictive
validity, should provide additional information about their
meaning for handicapped students.

In 1983, the College Board, Educational Testing Service
(ETS), and the Graduate Record Examinations (GRE) Board 
initiated a joint project, "Studies of Admissions Testing 
and Handicapped People," in response to a call by a National
Academy of Sciences Panel for further research into the use 
of college and graduate admissions tests for handicapped 
individuals (Sherman & Robinson, 1982). As part of that 
joint research effort, this study presents information on 
the psychometric characteristics of the Scholastic Aptitude
Test (SAT) for nine groups of handicapped examinees. The 
study reports data on the level of performance, test 
reliability, speededness, and extent of unexpected 
differential item behavior for these groups. These data, in
particular those on reliability and differential 
performance, are fundamental to evaluating the extent to 
which the SAT fairly and accurately measures the developed 
scholastic abilities of handicapped students.

The Scholastic Aptitude Test 

The Scholastic Aptitude Test is developed and 
administered by ETS as part of the Admissions Testing 
Program of the College Board, an independent, nonprofit 
membership organization that provides tests and other 
educational services to students, schools, and colleges
(College Board, 1983). The Board's membership is composed 
of more than 2500 colleges, schools, school systems, and 
educational associations. Along with other indicators,
institutions use the SAT to select students for admission,
to monitor changes in the academic capabilities of their
applicant and entering-freshmen populations, and to recruit
and place students.

The SAT is a multiple-choice examination made up of 
Verbal and Mathematical sections. The Verbal section of the 
exam is composed of 85 items falling into four categories:
analogies (20 questions), antonyms (25 questions), sentence 
completion (15 questions), and reading comprehension (25 
questions). Analogies items are meant to assess the 
examinee's ability to detect verbal relationships between 
pairs of words while antonyms are designed to measure 
breadth and depth of vocabulary (Dorans, 1982). Together, 
performance on these item types forms the SAT Vocabulary 
subscore.

The Reading subscore of the SAT reflects performance on
sentence completion and reading comprehension items.
Sentence completion tests a student's ability to recognize 
logical relationships among parts of a sentence. Reading 
comprehension questions assess a greater variety of 
abilities including recalling specific details, identifying 
the main idea, making inferences, analyzing arguments used 
by the author, detecting the author's tone or attitude, and
making generalizations on the basis of presented information 
(Dorans, 1982). Examples of each SAT-Verbal item type are 
presented in Figure 1.



Insert Figure 1 about here



The Mathematical section of the SAT contains 60
questions divided between two formats: standard multiple
choice (40 questions) and quantitative comparison (20
questions). Quantitative comparisons emphasize the concepts
of equality, inequality, and estimation, and generally
involve less reading, take less time to answer, and require
less computation than standard multiple-choice questions
(College Board, 1983). A quantitative comparison item
typically presents two quantities. The test candidate must
examine the quantities and select from four options the one
that best describes the relationship between the two
amounts. Examples of the quantitative-comparison and
standard multiple-choice item types are presented in
Figure 2.



The content of items in the SAT Mathematical section is
divided almost equally among arithmetic, algebra, geometry,
and miscellaneous questions designed to measure abilities
related to college-level work in the liberal arts, sciences,
engineering, and other fields requiring mathematics.
Miscellaneous questions test logical reasoning, number
theory, number systems, or other content that does not
readily fit into any of the three basic categories listed
above.

When administered, the SAT is divided into five
separately timed, 30-minute sections: two verbal, two
mathematical, and one experimental section that does not
count toward the student's score. The sections are bound
together in a test booklet that also contains a 50-question
Test of Standard Written English designed to assist colleges
in placing students in freshman English courses. Items of a
similar format are typically grouped together within
sections, though more than one item format can appear in
each section and the same item type can appear in more than
one section.

National administrations of the SAT are offered seven
times a year. The composition of student groups taking the
test at different times of the year varies widely, with high
school seniors constituting the bulk of examinees during the
fall administrations and juniors accounting for the larger
group during the spring exam period. Differences in average
ability are also apparent across administrations, with the
more able groups taking the exam during the early fall
(seniors) and late spring administrations (juniors).

Special administrations for handicapped students have
been offered since 1938, when braille and large-type
versions of the test were administered to visually impaired
examinees (Saretsky, 1983). Since that time, special
accommodations have been extended to students with physical,
hearing, and learning disabilities. The arrangements offered
have included extra time and rest periods; cassette,
braille, and large-type presentations; the use of a reader
or scribe; and various combinations of these arrangements.

Results of SAT administrations are reported for Verbal
and Mathematical performance, each on a 200 to 800
standard-score scale with a mean of 500 and standard
deviation of 100. The scale is based on the performance of
college applicants taking the test in 1941 (Donlon, 1984);
the performance of all subsequent groups is statistically
equated to that original administration. Hence, the means
and standard deviations of groups taking the test have
deviated over the years from their original values, but the
meaning of scores has stayed the same. Subscores for
Vocabulary and Reading are reported on a 20 to 80 scale.
Scores are accompanied by the designation "NON STD"
whenever the test was not administered under standard
conditions and ETS cannot assume comparability of the scores
to those achieved under typical circumstances.

The psychometric characteristics of the SAT have been
widely studied in the general population and in some special
populations (e.g., black examinees), but not with
handicapped students (Bennett, Ragosta, & Stricker, 1984).
Median correlation coefficients with college grades based
upon 827 predictive validity studies were reported to be .41
for the total test, .37 for Verbal, and .32 for Mathematical
(Educational Testing Service, 1980). Median coefficients
for high school grade point average (HSGPA) and for the SAT
and HSGPA combined were .52 and .58, respectively. As with
all averages, these median coefficients mask variation. The
predictive validity of the SAT varies as a function of
institutional characteristics (selection rules, grading
standards, educational program), academic year, student
population, and other factors. In some cases, these factors
cause the SAT's predictive validity to approach zero, while
in many others it is much higher than that attributed to
high school grade point average (Breland, 1978).

Subjects

During the period from Fall 1978 through July 1983, the
Admissions Testing Program's Services for Handicapped
Students offered two forms of the SAT, designated as WSA3
and WSA5, to handicapped students requesting special
administrations. Because retention of student data from
special administrations began in 1980, the only data
available for analysis are from March of that year through
June 1983, the time that two new forms were put into special
service.

During the March 1980 to June 1983 time period, 16,961
students were given special administrations of the SAT. Of
these students, 5,213 and 4,236 are known to have taken WSA3
and WSA5, respectively. Which of the two forms was taken by
the remaining students is unknown. During this period,
other handicapped students undoubtedly took standard
administrations of the SAT on national test dates. Because
it is not necessary to reveal the presence of a disability
unless a special administration is requested, the number of
handicapped students taking standard administrations is
unknown.

In this study, data from both WSA3 and WSA5 are used. By
using these two data sets, attention can be focused on those
findings that replicate across forms. Because of their
co-occurrence, such findings are less likely to be artifacts
associated with a single form or particular sample of
subjects. They are more probably stable results that will
manifest themselves in other samples from the same
disability group and on other forms of the SAT.

Students requesting special administrations of the SAT
during the study period fell into five major disability
groups: visually impaired (VI), physically handicapped
(PH), hearing impaired (HI), learning disabled (LD), and
multiple handicapped. Types of special administrations
offered included braille, large type, cassette, regular
type, cassette and large type, braille and cassette, and
cassette and regular type. All special administrations
included the option of extended time. Tables 1a and 1b show
the number of students with each disability taking each type
of special administration of WSA3 and WSA5.



Insert Tables 1a and 1b about here



As the tables show, the largest number of special
administrations (3552 for WSA3 and 2883 for WSA5) were taken
by learning disabled students, and the most frequently used
format was regular type (3889 for WSA3 and 2924 for WSA5).
Visually impaired students represented the second largest
disability group (893 for WSA3 and 858 for WSA5) and large
type the second most frequently used format (726 and 676).
Of the 35 possible test-format-by-disability-group
combinations, the two largest were LD students taking
regular-type (2983 for WSA3 and 2316 for WSA5) and visually
impaired students taking large-type administrations (486 and
498).

In addition to these two groups, seven other format-by- 
group combinations have numbers of students (roughly 100 or 
more on each form) sufficient to support dependable results 
and justify further study. These groups are, for regular 
type, visually impaired, hearing impaired, and physically 
handicapped students; for large type, learning disabled 
pupils; for braille, visually impaired examinees; and for 
cassette and cassette and regular type, learning disabled 
pupils. Table 2 lists the sample sizes and acronyms used to 
denote these nine groups. 



Insert Table 2 about here 



To properly evaluate the psychometric characteristics
of the SAT for these nine disability groups, some reference,
or "standard," population is needed. Without such a
population, the typical behavior of the test cannot be known
and any departures from this behavior by subpopulations
cannot be detected. In the present study, several standard
groups are used. Among these groups are the 5.1 million
high school students taking all forms of the SAT offered
during the March 1980 to June 1983 time period. Most
comparisons, however, are based on a standard group of high
school students who took forms WSA3 and WSA5 under typical
testing conditions. WSA3 was administered to 35,424 high
school seniors in Texas and California during October 1974;
WSA5 was given nationally to 33,161 high school juniors in
December of that same year.

Table 3 lists the mean Verbal and Mathematical scores 
for high school pupils taking WSA3 and WSA5, and for high
school students taking other forms of the SAT during the 
March 1980 to June 1983 period. As the table shows, the 
high school seniors taking WSA3 perform better than their 
counterparts taking the SAT during the 1980 to 1983 period 
on both Verbal and Mathematical, suggesting that the WSA3 
group is somewhat more select than the group of seniors 
typically taking the SAT. On the other hand, the juniors 
taking WSA5 seem to perform substantially worse than their 
counterparts taking the test during the 1980-83 period. 
Hence, students taking the WSA forms may not be broadly 
representative of those taking the SAT during the 1980-83 
period. Still, the nonhandicapped group taking the same form
under standard conditions, though not ideal, should prove
workable where needed as a reference for the nine disability
groups used in the study.



Insert Table 3 about here



Results 

Results are reported for level of performance, test 
reliability, speededness, and extent of unexpected 
differential item performance. 

Level of Performance

Table 4 lists scaled score means and standard 
deviations for the performance of the nine handicapped 
groups on the Verbal and Mathematical sections of WSA3 and
WSA5. Summary statistics for all pupils sitting for the
exam during the March 1980 to June 1983 period (designated
NHA) are also included. These students have taken test
forms other than WSA3 and WSA5. However, because Verbal and
Mathematical scores on the SAT are equated across forms, the 
scores of this reference population are expressed on the 
same scale as those of the nonhandicapped students taking 
WSA3 and 5. 



Insert Table 4 about here 



To facilitate comparison with students typically taking
the SAT, Table 5 presents the difference between the
handicapped and nondisabled student means in standard
deviation units of the nondisabled group. Review of Table 5
suggests some consistency in the performance of disability
groups across SAT forms. On the Verbal section, the mean
performance of the three visually impaired groups (VIB, VIR,
VIL) and of the physically handicapped group (PHR) is
generally better than or just below that of the
nonhandicapped reference group (NHA). The LD (LDR, LDCR,
LDC, LDL) and hearing impaired (HIR) groups have
substantially lower mean scores than the reference group,
usually by at least a half standard deviation. This general
pattern appears to hold for the Mathematical section also,
with the possible exception of visually impaired students
taking the braille format (VIB). These students score
relatively close to the nondisabled mean on one form and
dramatically below it on the other.
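
The standardized difference described above can be sketched in a few lines of Python. This is a minimal illustration only: the function name and the numerical values are hypothetical and are not drawn from Tables 4 or 5.

```python
def standardized_difference(group_mean, reference_mean, reference_sd):
    """Express a group's mean difference from the reference group
    in standard deviation units of the reference group."""
    if reference_sd <= 0:
        raise ValueError("reference_sd must be positive")
    return (group_mean - reference_mean) / reference_sd


# Hypothetical scaled-score values, for illustration only.
example = standardized_difference(
    group_mean=370.0, reference_mean=425.0, reference_sd=110.0
)
print(example)  # negative values mean the group scored below the reference
```

Under these assumed values the group mean lies half a reference-group standard deviation below the reference mean, which is the size of difference the text describes for the LD and hearing impaired groups.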



Insert Table 5 about here 



In addition to mean performance, the degree of
variability evidenced for some groups is also noteworthy
(see Table 4). On the Verbal section, restrictions in the
range of scores are found on both forms for the LD groups
taking cassette (LDC) and cassette and regular tests (LDCR),
while an unusually wide range with respect to the reference
group is noted for visually impaired-braille (VIB) students.
For Mathematical, LD students taking cassette and regular
editions (LDCR) show a restricted range on both forms.
Consistently widened ranges are found for two visually
impaired groups, those taking the regular edition (VIR) and
students using the large-type version (VIL).

Aside from mean performance and degree of variability,
differences in intra-test scores are of interest. Table 6
shows the extent to which each group scored better on Verbal 
or Mathematical relative to all students taking the SAT 
during 1980-83. The tabled indices represent the ratio of 
the difference between the Verbal and Mathematical scores 
for a handicapped group divided by the pooled standard 
deviation for that group to the same quantity calculated for 
nonhandicapped students. Positive values indicate a 
difference in the same direction as for the reference group 
(i.e., Mathematical greater than Verbal), while negative 
values denote the converse. The magnitude of the index 
shows the extent to which the standardized difference is as 
large as the comparable value for the reference group. A 
value of 1.00 indicates intra-test performance equivalent in 
magnitude and direction to the reference group. 
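
The index just described can be illustrated with a short Python sketch. The text does not give the exact pooling formula, so the sketch assumes the pooled standard deviation is the root mean square of the Verbal and Mathematical standard deviations. The function names and all numerical values are hypothetical.

```python
import math


def pooled_sd(sd_verbal, sd_math):
    # Assumption: equal-weight pooling of the two section SDs.
    return math.sqrt((sd_verbal ** 2 + sd_math ** 2) / 2.0)


def standardized_vm_difference(mean_verbal, mean_math, sd_verbal, sd_math):
    """Mathematical minus Verbal mean, in pooled-SD units for one group."""
    return (mean_math - mean_verbal) / pooled_sd(sd_verbal, sd_math)


def intra_test_index(group, reference):
    """Ratio of a group's standardized difference to the reference group's.

    Each argument is a tuple:
    (mean_verbal, mean_math, sd_verbal, sd_math).
    """
    reference_value = standardized_vm_difference(*reference)
    if reference_value == 0:
        raise ValueError("reference group shows no Verbal-Mathematical difference")
    return standardized_vm_difference(*group) / reference_value


# Hypothetical values, for illustration only.
reference_group = (420.0, 460.0, 100.0, 100.0)
handicapped_group = (350.0, 430.0, 100.0, 100.0)
print(intra_test_index(handicapped_group, reference_group))
```

With these assumed inputs the group's Mathematical advantage is twice as large, in standardized terms, as that of the reference group. A group scoring higher on Verbal than on Mathematical would yield a negative index.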

From the table it can be seen that hearing impaired- 
regular type students (HIR) show a consistent performance 
difference in favor of Mathematical about twice as large as 
for the nonhandicapped reference group. This performance 
difference is consonant with the documented English language
deficiencies of this group (e.g., Meadow, 1980). Visually
impaired-braille pupils (VIB) also show consistently
different intra-test performance. Unlike the reference
group, these students do not evidence uniformly superior
performance on Mathematical relative to Verbal. One
possible explanation for this finding is that visually
impaired-braille students are encountering unusual
difficulty with geometry and other math items involving the
understanding of figures, tables, or special symbols.



Insert Table 6 about here 



A final point of interest relates to differences
between each group's performance on the two SAT forms (see
Table 7). Examination of Table 7 shows that the scores of
some groups differ substantially across forms. Because
scores from different forms are equated, variations in
performance generally suggest real differences in the
abilities of the groups taking one form or another. An
alternative explanation is that the equating procedure,
which is based on the performance of nonhandicapped students
taking standard administrations, operates differently when
applied to the scores of disabled pupils taking nonstandard
examinations. This latter possibility is not very likely,
however, since all the handicapped distributions show
considerable overlap with the standard population.



Insert Table 7 about here



Reliability

Reliability refers to the precision or accuracy with
which a test measures. Differences in the precision of
measurement across groups can adversely affect the less
accurately measured group. For example, consider what
might happen if an admissions test measured less precisely
for deaf than for hearing students. In this situation, the
dispersion of the observed scores of deaf students around
their true scores (i.e., those scores indicative of their
actual abilities) would generally be greater than it would
be for hearing pupils. The admissions officer's decision
to admit or place a deaf student would, therefore, be
subject to a greater likelihood of error than for
nonhandicapped applicants.

The two indices most often used to assess test
reliability are the reliability coefficient and the standard
error of measurement (SEM). By definition, the reliability
coefficient is affected by the amount of test score
dispersion in a group, with smaller variances tending to
produce smaller reliability coefficients. Because of this
sensitivity to within-group homogeneity, the reliability
coefficient is limited as a comparative measure of precision
across groups. (It retains utility, however, as an index of
the test's ability to separate individuals within a given
group.) The standard error of measurement is relatively
unaffected by score variance. It, therefore, is better
suited to the comparison of measurement accuracy across
populations.
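
The contrast between the two indices can be made concrete with the classical test theory relation SEM = SD x sqrt(1 - reliability). The Python sketch below is illustrative only: the function names are hypothetical, and the raw-score values are invented rather than taken from Table 8.

```python
import math


def sem_from_reliability(sd, reliability):
    """Classical relation: SEM = SD * sqrt(1 - reliability)."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    return sd * math.sqrt(1.0 - reliability)


def reliability_from_sem(sd, sem):
    """Inverse relation: reliability = 1 - SEM^2 / SD^2."""
    if sd <= 0:
        raise ValueError("sd must be positive")
    return 1.0 - (sem ** 2) / (sd ** 2)


# Hypothetical raw-score values: two groups measured with the same
# precision (same SEM) but differing in score spread.
shared_sem = 3.75
wide_group_sd = 15.0
narrow_group_sd = 10.0

print(reliability_from_sem(wide_group_sd, shared_sem))    # wider spread
print(reliability_from_sem(narrow_group_sd, shared_sem))  # restricted spread
```

Under these assumptions the group with the restricted score range shows a visibly lower reliability coefficient even though both groups are measured with identical precision, which is why the SEM is the preferred index for comparisons across groups.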

Table 8 presents alpha reliability coefficients and
standard errors of measurement for handicapped and
nonhandicapped students taking WSA3 and WSA5. For the
nonhandicapped students (denoted as NHF), reliability
coefficients for the Verbal section are both .92.
Coefficients for the disability groups fall within a few
points of these values, with the exception of the learning
disability-cassette (LDC) and LD-cassette/regular (LDCR)
groups, for which the coefficients run between .84 and .86.
As previously noted, these groups are also among the most
restricted in score range.



Insert Table 8 about here



Standard errors of measurement are presented in raw
score units. For the high school students taking the
85-item Verbal sections of WSA3 and WSA5, the raw-score SEMs
are 3.73 and 3.75, respectively. Without exception, the SEMs
for all handicapped groups are virtually identical to these
values, differing by only a few hundredths of an item.

Reliability coefficients for nonhandicapped students
taking the Mathematical section range from .91 to .92. Again,
coefficients for the handicapped groups hover closely about
these values, though in this case no group consistently
deviates from the nonhandicapped figures. Likewise, the
SEMs for the handicapped samples are virtually
indistinguishable from those for the nondisabled group.

The alpha coefficients and SEMs reported above
incorporate one primary source of measurement error: that
due to differences in the samples of items used to assess
scholastic ability. A second major source, error due to
differences in the occasions on which ability was assessed,
is not included. However, coefficients incorporating both
major error sources have been reported for the SAT with
similar results for several disability groups (Bennett,
Ragosta, & Stricker, 1984), suggesting that consideration of
the additional error source does not greatly change the
comparability of measurement precision across populations.

Test Speededness

Special administrations of the SAT are commonly given 
with allowance for extra time and rest periods. However, 
the amount of extra time afforded may not be enough for the 
same proportion of disabled students to complete the test as 
their nondisabled peers, thereby introducing an unfair 
disadvantage into the testing process.

To check the extent to which the test is speeded for
students taking special administrations, two indices, the
percent of students completing the section and the percent
finishing 75% of the section, were computed and compared to
those for high school students taking the WSA forms in
standard administrations. Because neither index is a fully
satisfactory measure of speededness, they are jointly
considered in the evaluation of test timing. (In isolation,
the index based on the percent of students completing the
section can be particularly misleading because it does not
distinguish between students intentionally omitting the last
item and those not reaching it. Hence, a closing item that
is particularly hard for one group may cause this index to
give a spurious indication of speededness.)

Table 9 presents the ratio of each disability group
index to its reference-group counterpart. Values of 1.00
indicate equal percentages completing the section or part
section for both groups, while those above 1.00 suggest
greater completion rates for disabled students. When both
speededness indices and both forms are simultaneously
considered, it is clear that, with respect to the reference
samples, no disability group is consistently disadvantaged
by lack of time. On the contrary, several groups, such as
hearing impaired-regular type students (HIR) and visually
impaired-braille pupils (VIB), may receive more time than
necessary on selected SAT sections.
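
The two speededness indices and their ratios to the reference group can be sketched as follows. This is a minimal illustration that assumes each examinee's record is reduced to the position of the last item answered in a section. The function names and the data are hypothetical and do not reproduce Table 9.

```python
def completion_rates(last_item_answered, n_items):
    """Return (percent completing the section,
    percent finishing at least 75% of the section)."""
    n_examinees = len(last_item_answered)
    if n_examinees == 0:
        raise ValueError("at least one examinee is required")
    threshold = 0.75 * n_items
    completed = sum(1 for item in last_item_answered if item >= n_items)
    three_quarters = sum(1 for item in last_item_answered if item >= threshold)
    return (100.0 * completed / n_examinees,
            100.0 * three_quarters / n_examinees)


def speededness_ratios(group, reference, n_items):
    """Ratio of each group index to its reference-group counterpart."""
    group_rates = completion_rates(group, n_items)
    reference_rates = completion_rates(reference, n_items)
    if 0.0 in reference_rates:
        raise ValueError("reference rate of zero makes the ratio undefined")
    return (group_rates[0] / reference_rates[0],
            group_rates[1] / reference_rates[1])


# Hypothetical last-item-answered positions on a 40-item section.
reference_sample = [40, 40, 40, 35, 30, 28, 40, 40, 20, 40]
disability_sample = [40, 40, 40, 40, 32, 25, 40, 40, 40, 31]
print(speededness_ratios(disability_sample, reference_sample, n_items=40))
```

Both ratios exceed 1.00 for these invented data, the pattern the text interprets as no disadvantage from lack of time. As the text cautions, an index built on the last item answered cannot separate intentional omission of the final item from failure to reach it.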



Insert Table 9 about here 



Unexpected Differential Performance 

The concept of unexpected differential performance is
derived from the notion that items on a unidimensional test
should measure the same construct for different groups of
examinees (Shepard, 1982). Items found to measure different
constructs across groups are biased in the sense that, for
some groups, they may be assessing factors irrelevant to the
purpose of assessment. If found in any number, such items
may unfairly lower (or raise) the test scores of a group.
In many cases, however, biased items are found, in the
aggregate, to affect different groups equally, cancelling
any overall advantage or disadvantage that might otherwise
be afforded (Berk, 1982; Shepard, 1982). Still, the
identification of such items is important, for it alerts
test developers to the kinds of questions that should be
removed from future test revisions, or at least not
disproportionately added lest the balance of questions
favoring and disfavoring groups be destroyed.

Most methods of detecting items that operate
differently across groups consider an item to be deviant if
groups of equal ability perform differently on it. This
definition of item bias makes sense only if it can be
assumed that the test or subtest under investigation is
basically unidimensional (Shepard, 1982). If the measure
can be safely considered to be unidimensional, then
differences in performance on an item that remain after
standing on the dimension has been accounted for must be due
to irrelevant sources.

A second common characteristic of item bias methods is
that total test score is used as a proxy for ability level
(Shepard, 1982). If all items in the test measure the same
irrelevant construct for one group, it is possible that no
indication of bias will appear; no item will stand out
because it measures something different from the others.
Item bias methods cannot, therefore, detect pervasive bias
in a test because they lack an external criterion. As such,
the study of item bias can be only one part of a
comprehensive investigation of a test's fairness. At a more
macroscopic level, a comprehensive investigation should also
consider the test's factor structure, to see if the test
actually measures the same general construct across groups,
and its relationship with external variables, to ensure that
relevant criteria are predicted with equal accuracy.

To detect the possible presence of SAT item types that
operate differently across groups, a two-stage method was
used. First, items were organized into logical clusters.
Cluster structures were based on those characteristics that
might prove unusually troublesome for particular groups of
handicapped examinees and on groupings typically used in the
SAT development process. The performance of these clusters
was then investigated. Second, items belonging to
deviantly operating clusters were studied to determine if
the cluster itself defined a potentially biased item type
or, alternatively, if only a few aberrant items accounted
for the unusual cluster performance.

This two-stage approach is somewhat different from the 
methodology traditionally used in item bias research. In 
the traditional approach, all items are individually 
assessed (e.g., see Kulick, 1984). Individually assessing
all items, however, has significant practical disadvantages
in studies when several groups and forms are involved.
First, this method necessitates the analysis of a large
number of item performances. In the present study, nine
disability groups and 145 items per SAT form would generate
2610 performances across the two forms studied. Second, even
in groups in which bias is
known not to exist (e.g., two random samples drawn from the
same population), statistical techniques will identify by 
chance some small proportion of items as biased (Sinnott, 
1980). Assuming, for example, a significance level of .05, 
2610 contrasts would produce 131 items flagged by chance 
alone. These items would, of course, be mixed in with other 
correctly identified questions. Separating the two groups 
through content analysis would take substantial effort, and 
in some cases be unsuccessful as the underlying causes for 
differential operation are frequently unclear (Scheuneman,
1982).
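
The arithmetic behind these two figures can be checked
directly. The short Python sketch below is illustrative only;
the variable names are ours, and it assumes that every
contrast is a true null hypothesis tested at the .05 level.

```python
# Expected number of chance flags when every item is tested separately.
# Counts come from the text; treating every contrast as a true null
# tested at alpha = .05 is an assumption made for illustration.
groups = 9            # disability groups
items_per_form = 145  # items on each SAT form
forms = 2             # the two forms studied
alpha = 0.05          # significance level used in the example

performances = groups * items_per_form * forms
expected_chance_flags = alpha * performances

print(performances)           # 2610 item performances
print(expected_chance_flags)  # about 130.5, reported in the text as 131
```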

Item clusters. The rationale behind the study of item
clusters is generally the same as that used for items: on a 
test measuring a single construct, a cluster should be of 
equal difficulty for different groups of examinees of the 
same ability. If not, the cluster is measuring different 
abilities in the groups. 

To examine the performance of clusters, Verbal section 
items were divided by type into the four formats used in the 
test: antonyms (25 items), analogies (20), sentence 
completion (15), and reading comprehension (25). These item
types were assumed to measure a single verbal ability
factor, an assumption supported by the results of factor
analysis (Rock, Bennett, & Kaplan, in press). The
regression of each item-type cluster score on the total
Verbal score for nonhandicapped students taking the forms
(i.e., the standardization group) was then computed. This
regression provided a prediction of performance for the
standardization group on each item-type cluster for each
Verbal score level. Using the Verbal mean for each
disability group in turn, the predicted cluster score for a
nonhandicapped group of the same total ability was
calculated. The predicted cluster mean for the
nonhandicapped group was then subtracted from the
handicapped group's actual cluster mean, yielding a positive
residual if the disabled students did better than the 
reference group and a negative one when performance was 
worse than predicted. Finally, this residual was divided by 
the cluster standard deviation for the disability group. A 
meaningful departure from the expected difficulty of the 
cluster was said to exist for a group if the standardized 
residual exceeded an absolute value of .2 standard 
deviations on both SAT forms. This .2 standard deviation 
criterion has been previously suggested as a minimum for
identifying the presence of meaningful effects in the social
sciences (Cohen, 1969).
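
The steps just described can be sketched in a few lines of
Python. This is a minimal illustration, not the program used
in the study: the function names and the toy scores are
hypothetical, and the use of the sample (n - 1) standard
deviation is an assumption, since the text does not say which
form was used.

```python
from statistics import mean, stdev


def fit_line(x, y):
    """Least-squares intercept and slope for y regressed on x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def standardized_residual(ref_total, ref_cluster, grp_total, grp_cluster):
    """Actual minus predicted cluster mean, in disability-group SD units."""
    # Regression of cluster score on total score in the reference group.
    intercept, slope = fit_line(ref_total, ref_cluster)
    # Predicted cluster mean for a reference group with the same total mean.
    predicted = intercept + slope * mean(grp_total)
    # Positive: the group did better than predicted; negative: worse.
    residual = mean(grp_cluster) - predicted
    # Assumption: sample standard deviation of the group's cluster scores.
    return residual / stdev(grp_cluster)


def flagged(residual_form_a, residual_form_b, criterion=0.2):
    """True when the residual exceeds the criterion on both forms."""
    return abs(residual_form_a) > criterion and abs(residual_form_b) > criterion


# Toy, made-up scores (not data from the study).
example = standardized_residual(
    ref_total=[10, 20, 30, 40], ref_cluster=[2, 4, 6, 8],
    grp_total=[20, 30], grp_cluster=[3, 5],
)
print(example)
```

As written, `flagged` checks only the magnitude on each form;
whether the two residuals must also share a sign is not
stated in the text.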

Previous research and clinical findings raise the 
possibility that some disability groups experience unusual
difficulty on selected Verbal item types. For example, some
studies have found items associated with lengthy passages to
be differentially difficult for deaf students (Rudner, 1978;
Trybus & Buchanan, 1973, in Rudner, 1978). As SAT reading
comprehension items are of this type, unusually poor
performance on this subtest (that is, relative to
nonhandicapped students achieving the same total score)
might be expected. Vocabulary items also are reported to be
difficult for these students (Ragosta & Kaplan, in press),
as well as for those with learning disabilities. Learning
disabled pupils are said to have particular difficulty with
antonyms and with the logical relationships required by
verbal analogies (Wiig & Semel, 1973, 1974, 1975, in Wiig &
Semel, 1976). Finally, analogies have been found to be 
differentially difficult for other special populations, in 
particular black examinees (Dorans, 1982; Kulick, 1984). 

Table 10 presents standardized residuals for each 
disability group's performance on the four Verbal item 
types. As can be seen, most values are below .1 standard 
deviations in magnitude and no value exceeds .2 standard 
deviations on both forms. The pair of values that comes 
closest to the .2 criterion is for hearing impaired students
on Sentence Completion, an item type that might prove
somewhat differentially difficult because of the syntactic
complexity of the constructions occasionally used.



Insert Table 10 about here



For the Mathematical section, standardized residuals
for the item clusters were calculated in a way similar to
that for Verbal with two exceptions. First, the cluster
scores of nonhandicapped examinees were regressed on total
Mathematical (instead of total Verbal) score to obtain a
prediction of expected cluster performance for the
disability groups. Second, several overlapping cluster
structures were tested based on the presence of graphical 
material, content, and reading load. More than one 
structure was tested because Mathematical items appear, at 
least on the surface, to require a broader constellation of 
basic skills for solution, thereby allowing more room for 
bias. For example, in addition to reasoning ability, some 
Mathematical items require the visual skills needed to read 
graphs, tables, or special symbols, or to manipulate figures 
in space; for visually impaired examinees, these items may 
be more a measure of visual-spatial than math reasoning 
skills. Other items, such as word problems, entail reading. 
The functioning of these items should be considered suspect 
for poor readers and for pupils with limited language
skills, such as learning disabled and deaf examinees.

For the first analysis, four clusters based on the 
presence or absence of graphics were used. To form 
clusters, items were first split into standard multiple
choice and quantitative comparison item types. Each of
these groups was divided into graphics (i.e., items
including tables or figures) and nongraphics, or text, items
to form the following clusters: text multiple choice, text
comparisons, graphics multiple choice, and graphics
comparisons. Items were considered to involve graphics only
if a graphic was actually presented.

Standardized residuals for these clusters are presented
in Table 11. While most of the standardized residuals fall
between -.1 and .1 standard deviations, striking difficulty
effects for visually impaired-braille students on the
graphics multiple choice cluster are apparent. The results
for these students on graphics comparisons are less
consistent, with WSA5 showing a large differential
difficulty effect and WSA3 evidencing an effect just below
the .2 criterion. Investigation of the items suggests that
the types of graphics used for this cluster on WSA3 are less
complex and diverse than those used on WSA5.



Insert Table 11 about here



The second cluster structure investigated involved
eight item groups primarily based on test content. Again,
items were split into multiple choice and quantitative
comparisons. These divisions were then separated into
arithmetic, algebra, geometry, and miscellaneous sets.
Because the resulting miscellaneous comparisons set included
only two items on one form and one item on the other, this
cluster was dropped from the analysis.

Standardized residuals for the seven content clusters
are presented in Table 12. As inspection of the table bears
out, the residuals for this cluster structure generally
appear larger than those for the previous one. Still, the
.2 criterion on both forms is exceeded only three times.
Among those exceeding the criterion, the algebra comparisons
cluster appears unexpectedly easy for learning
disabled-cassette (LDC) and for hearing impaired-regular
(HIR) examinees. Similar, though insignificant, effects are
also found for the other groups on this cluster. One factor
that may be contributing to this finding is that the cluster
is disproportionately loaded with late-appearing items, items
that those taking extended-time administrations would be
more likely to reach. Of the six items on WSA3, two are at
the end of the 35-question section (#32 and 34), while two
of five items are at the close of WSA5 (#32 and 35).



Insert Table 12 about here 



In addition to the significant effect for the algebra 
comparisons cluster, miscellaneous multiple choice items
were found to be unexpectedly difficult for visually 
impaired-braille students (VIB). Analysis of item content 
for this cluster suggests that it is composed of a 
collection of items that may prove unexpectedly difficult
for different reasons. Among the potential sources of
differential difficulty are items that utilize novel
symbols, assess concepts (e.g., probability) often taught
using graphics (e.g., Venn diagrams), assume clear
translation into braille of, and facility with, visually
based symbol systems (e.g., the tally system), or require
skill in
mentally manipulating figures in space. 

The final Mathematical cluster structure examined
involved three groupings based on reading load: nonreading,
minimal reading, and reading. Items were placed in the
reading category if they contained more than one line of 
text in the stem or response options. Minimal reading items 
were those with approximately one line of text or less, 
while nonreading items contained no words, only mathematical 
symbols and numerals. Written directions at the beginning 
of each of the two Mathematical sections were not included 
in the analysis as the amount of reading entailed was 
constant for all items. 

Table 13 presents standardized residuals for the 
reading load clusters. No consistent effects are found, 
except for hearing impaired-regular students (HIR) who find 
the nonreading cluster unexpectedly easy. Again, a 
contributing factor may be the disproportionate loading of
this cluster with late-appearing items. This explanation is
consistent with the effect sizes: in the WSA3 cluster,
three of eight items are at the end of the test section and
an effect of .36 standard deviation units is found, while
the WSA5 cluster has proportionally fewer late-appearing
items (three of 13) and a much smaller effect (.2 standard
deviations).



Insert Table 13 about here 



A second possible explanation for this effect is that 
hearing impaired students perform better on this cluster
because it is comparatively free of language. This
explanation would be supported by the discovery of
difficulty effects for this group on the reading cluster,
which contains a fair amount of language. Since such
effects are not uniformly apparent, the explanation may not
be wholly satisfactory.

With the possible exception of deaf students, then,
significant difficulty effects for reading load are not
evident. This finding is encouraging, especially for 
learning disabled examinees who generally possess reading 
and language deficits. For such groups, these results imply 
that, with extended-time and other relevant special 
modifications (e.g., cassette presentation), the reading 
load associated with Mathematical items is light enough to 
avoid interfering with measurement of the underlying 
mathematical reasoning ability presumably tapped by the
test. 

In sum, the analysis of item clusters has identified 
five consistent effects of a magnitude large enough to 
warrant closer study. All identified effects are associated
with the Mathematical section of the SAT. The negative
effects (that is, those indicating unexpected difficulty)
are concentrated among visually impaired-braille students
and evidence themselves on the graphics multiple choice and
miscellaneous multiple choice item clusters. For the former
cluster, the effect was hypothesized as being due to the
presence of complex graphics, tables, and figures which
measured basic visual-spatial skills in addition to
mathematical reasoning ability. For the latter cluster, a
conglomeration of factors, including unfamiliar symbols and
operations requiring visual-spatial skills, was posed as
the source of differential performance.

Positive effects, denoting that the associated clusters
were unexpectedly easy, were found for the hearing
impaired-regular and learning disabled-cassette groups on
algebra comparisons, and for the hearing impaired-regular
group on the nonreading cluster. In all three cases, the
effects were
suggested to be the result of a methodological artifact: 
the disproportionate presence of late-appearing items in a 
cluster. 

Individual items. The identification of item clusters
can be considered the first, or screening, stage in a
two-tiered procedure for detecting broad item classes that
appear to operate differently for handicapped and
nondisabled populations. Therefore, after screening the
Verbal and Mathematical item clusters and identifying
groupings that appeared to operate differently, a second,
more focused methodology was applied. This methodology was
designed to detect individual items that seemed to
contribute significantly to the finding of differential
difficulty for the cluster. In so doing, the methodology
indicated the extent to which cluster effects were due to a
few isolated items or, alternatively, to the preponderance
of items composing the type. In addition, the methodology
was meant to provide rigorous statistical tests of the
implicit assumption that the relationship between total
score and the probability of passing a given item is the 
same in the standard and handicapped populations. 

To accomplish these goals, logistic regression was used 
to analyze performance on those items composing the clusters 
identified as differentially difficult or easy for a 
handicapped group. Within each identified cluster, the item 
performance of the handicapped group was contrasted with
that of the standardization population (i.e., nonhandicapped
students taking WSA3 or WSA5) to determine if the
expectations of passing a given item (conditioned on total
test score) were equivalent across groups. In addition,
logistic regression was used to compare the slopes of the
regression of item performance on total test score for
handicapped and
nondisabled groups. This latter comparison indicated the 
extent to which an item evidenced differential operation as 
a function of ability level (e.g., no differential operation 
for low-scoring handicapped examinees but differential 
difficulty for high-scoring ones).

More formally, in the standardization population, c,
the probability, P, of passing item i given a total score
X = X' is:

$$P_{ci} = P(x_i = 1 \mid X = X')$$

where $x_i$ is a 0 or 1 score obtained on item i. Similarly,
in any given handicapped group, h:

$$P_{hi} = P(x_i = 1 \mid X = X')$$

The question to be answered is whether:

$$P_{ci} - P_{hi} \neq 0$$

The logistic regression model first estimates the unknown
regression parameters in the following equation:

$$\log\left(\frac{P}{1 - P}\right) = B_0 + B_1 D + B_2 X \qquad (1)$$

where P is the probability of passing a given item, D is a
dummy variable indicating whether an individual is in the
standardization or handicapped group, and X is the total
test score.

Given maximum likelihood estimates of the unknown
regression parameters $B_0$, $B_1$, and $B_2$, the expected
probability of a standardization group student passing item
i is:

$$\hat{P}_{ci} = \frac{1}{1 + e^{-B_0}}$$

and the expected probability of a handicapped group student
passing item i is:

$$\hat{P}_{hi} = \frac{1}{1 + e^{-(B_0 + B_1)}}$$

Tests of the equivalence of slopes are carried out by adding
a cross-product term to equation (1).
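
The model in equation (1) can be illustrated with a small,
self-contained Python sketch. It is not the program used in
the study. The simulated data, the function names, and the
Newton-Raphson fitting routine are all our own, and the
sketch assumes that total score X is centered at zero, so
that the intercept-only expressions for the expected
probabilities describe an examinee at the centering point.
Significance tests for the coefficients are omitted.

```python
import math
import random


def solve(a, b):
    """Solve a small linear system a x = b by Gaussian elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_logistic(rows, y, iterations=25):
    """Maximum likelihood estimates by Newton-Raphson.

    rows: predictor lists, each starting with 1.0 for the intercept.
    y:    0/1 item scores.
    """
    k = len(rows[0])
    beta = [0.0] * k
    for _ in range(iterations):
        grad = [0.0] * k
        hess = [[0.0] * k for _ in range(k)]
        for xr, yi in zip(rows, y):
            eta = sum(b * v for b, v in zip(beta, xr))
            p = 1.0 / (1.0 + math.exp(-eta))
            for i in range(k):
                grad[i] += (yi - p) * xr[i]
                for j in range(k):
                    hess[i][j] += p * (1.0 - p) * xr[i] * xr[j]
        beta = [b + s for b, s in zip(beta, solve(hess, grad))]
    return beta


def simulate(n_ref=2000, n_focal=1000, b0=0.5, b1=-1.0, b2=1.0, seed=0):
    """Hypothetical data: D = 1 marks the handicapped group, X is centered."""
    rng = random.Random(seed)
    rows, y = [], []
    for d, n in ((0.0, n_ref), (1.0, n_focal)):
        for _ in range(n):
            x = rng.gauss(0.0, 1.0)
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * d + b2 * x)))
            rows.append([1.0, d, x])
            y.append(1 if rng.random() < p else 0)
    return rows, y


rows, y = simulate()
b0, b1, b2 = fit_logistic(rows, y)          # equation (1)
p_c = 1.0 / (1.0 + math.exp(-b0))           # standardization group
p_h = 1.0 / (1.0 + math.exp(-(b0 + b1)))    # handicapped group
difference = p_h - p_c                      # negative: harder than expected

# Slope test: add the cross-product D * X as a fourth predictor.
rows_int = [r + [r[1] * r[2]] for r in rows]
b_int = fit_logistic(rows_int, y)           # b_int[3] is the interaction term
```

Because the simulated item was built with a negative group
coefficient, the fitted difference comes out negative, which
in the sign convention of the tables marks an item that is
unexpectedly difficult for the handicapped group.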

Table 14 presents results for the visually impaired- 
braille (VIB) and standardization (NHF) groups on items 
belonging to the graphics multiple choice cluster. For each 
item in the cluster, the probabilities of passing for each 
group, the difference in those probabilities, and the 
presence of an interaction effect are listed. Differential 
operation was said to exist when the logistic regression 
coefficient ($B_1$) was significantly different from zero or
when tests of the equivalence of slopes indicated 
significant differences. Whether the item was unexpectedly 
easy or hard is indicated by the presence or absence of a 
negative sign in the difference column; a negative
difference in the probability of passing an item indicates 
that the item was unexpectedly difficult for the VIB group. 

Because of the large sample sizes in the standardi- 
zation populations, relatively trivial differences in 
probabilities are often significant. It is, therefore, 
suggested that differences in probabilities be at least .1 
before a statistically significant result is considered
practically meaningful. Unfortunately, for interaction
effects, no criterion for practical importance can be easily
derived. Therefore, when significant, these effects may
indicate only the most minimal deviations and, hence, should
be interpreted with caution. 
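
The decision rule for main effects can be written out as a
short Python sketch. The function name, labels, and example
probabilities are hypothetical; only the .1 threshold and the
sign convention (a negative difference means unexpectedly
difficult for the handicapped group) come from the text.

```python
def classify_main_effect(p_standard, p_handicapped, significant,
                         min_difference=0.1):
    """Label an item's main effect using the .1 practical criterion."""
    # Difference is handicapped minus standardization, so a negative
    # value marks an item that is unexpectedly difficult.
    difference = p_handicapped - p_standard
    if not significant:
        return "no significant main effect"
    if abs(difference) < min_difference:
        return "significant but below practical criterion"
    if difference < 0:
        return "unexpectedly difficult"
    return "unexpectedly easy"


# Made-up probabilities for illustration only.
print(classify_main_effect(0.60, 0.35, significant=True))
print(classify_main_effect(0.60, 0.55, significant=True))
print(classify_main_effect(0.50, 0.70, significant=True))
```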



Insert Table 14 about here 



As the table indicates, six of the ten items in the 
WSA3 cluster show statistically significant effects: four 
main effects and two interactions. Of the items showing
main effects, one far exceeds the .1 difference criterion
and two approach it. In the WSA5 cluster, two of eight main
effects are significant, as are two interactions. The two
items with main effects approach, but do not reach, the .1 
difference level. 

The item evidencing the greatest difference in the 
probabilities of passing on the two forms (116) requires the 
examinee to choose from among five options the size of one 
of several angles resulting from the intersections of a
series of lines, given information about the relationship
between the lines and the sizes of related angles (see
Figure 3a). While the text of this item contains several
special symbols (two to label lines and one denoting the
parallel relationship of the lines), definitions for all
symbols are provided either in the test directions or in the
accompanying figure. Further, other items which use similar
notations do not evidence difficulty effects. Hence, the
specific content responsible for the observed effect is not
immediately clear. This failure to identify the likely
cause of differential operation is, unfortunately, a common 
occurrence in item performance studies (Scheuneman, 1982). 



Insert Figure 3 about here 



Of the items approaching the .1 criterion, one requires 
the mental rotation of two graduated dials, one embedded 
within the other; two involve determining the area of a 
figure; and one computing the length of a line given 
intermediate distances. The graduated cylinder item, in 
particular, may require cognitive-spatial skills that are 
less well developed in visually impaired examinees.

Table 15 presents results for the performance of 
visually impaired-braille (VIB) and nonhandicapped students 
(NHF) on the miscellaneous multiple choice cluster. On
WSA3, five of six items show statistically significant main
effects, two of which also show interaction effects. Of the
five significant items, one far exceeds the practical
criterion and one approaches it. For WSA5, three of six
differences are significant, with only one item achieving
the criterion for practical importance. (One of these three
items [13] was included above as significant in the graphics
multiple choice cluster.)



Insert Table 15 about here



The item showing the largest difficulty effect across 
both forms (16) is presented in Figure 3b. This item asks 
the examinee about the tally system. In this system, the 
number five is represented by four vertical lines crossed by
a diagonal, seven is denoted by the symbol for five followed
by a group of two vertical lines, ten is shown by two symbols
for five, and so on. One plausible cause for the observed
difficulty effect is that this symbol system is less
familiar to blind students. A second probable contributing
factor is that the versions presented to blind and sighted
students were slightly different. Because the print symbol
for five (i.e., four lines crossed by a diagonal) could not
be represented easily as a raised line drawing within the
braille text of the item, it was denoted by a group of five
uncrossed braille symbols for the letter "l". To reflect
this modification, the text of the item was changed from
"How many uncrossed tallies would be used in the
representation of 29 in this system," to the somewhat more
complex "How many tallies not in sets of five would be used
in the representation of 29 in this system?" (emphasis in
original). The added linguistic complexity of this
modification, along with the novelty of the tally system,
are likely causes of the observed differential difficulty
for blind students.

The other item reaching criterion (HO) is unexpectedly
easy for VIB students. This item (see Figure 3c) requires
the examinee to choose from among five options the set of
travel directions that would produce the same result as a
given sequence. The spatial skills required by this task
may be similar to those used by blind students in memorizing
directions and in forming mental representations of 
frequently-used physical environments (e.g., paths, rooms, 
buildings). It is possible that blind individuals have 
developed such skills to a greater degree than sighted peers 
of equal math reasoning ability, thus accounting for their 
unexpectedly high performance on this item. 

Presented in Table 16 are results for the performance 
of hearing impaired-regular (HIR) and nondisabled (NHF)
students on the nonreading cluster. Most effects for
this group are positive, a result consistent with the
finding that this item grouping was differentially easy for
these examinees. On WSA3, three of eight items show
significant main effects, with two of these three also
evidencing interactions. None of the significant items
approaches the .1 practical criterion. On WSA5, seven of 13
items are significant: five items show main effects, one
both a main effect and an interaction, and one an
interaction effect only. None of the main effects comes
reasonably close to .1. For
WSA3, the significant effects are associated with items
appearing at the end of a section, a finding consistent with
the hypothesis that this cluster was easier for hearing
impaired students because extended time permitted them to
reach these items in greater proportions than their
nonhandicapped peers. However, though some items at the
close of the section on WSA5 also show significant effects, 
so do several other items placed earlier in the test, 
suggesting that something other than, or in addition to, 
timing is responsible for the differential performance of 
this group. 



Insert Table 16 about here 



Performance results for hearing impaired-regular (HIR) 
and nonhandicapped (NHF) students on the algebra comparisons 
cluster are given in Table 17. Again, as expected, most 
effects are positive. Three of six items show main effects 
on WSA3, with one also displaying an interaction. (Two of 
these three were noted as significant in the discussion of 
the nonreading cluster.) On WSA5, two of five items (both
of which also appear in the nonreading cluster) are
significant; one of these items also exhibits an
interaction. Neither of the two significant main effects
comes reasonably close to the .1 practical criterion. Again,
significant items appear at the end of test sections and in
earlier locations, suggesting that extra time alone is not a
sufficient explanation for differential operation.



Insert Table 17 about here



Table 18 presents performance results for learning 
disabled-cassette (LDC) and nonhandicapped (NHF) students on
this same subtest. Two of the six items on WSA3 and two of
the five on WSA5 show effects, all of which are positive. 
In addition, one significant interaction appears on each 
form. The four items showing main effects are the same as 
those that showed positive effects for hearing impaired- 
regular examinees. Again, however, none of the effects 
approximates the .1 criterion and no consistent clustering 
at the end of test sections is apparent. 



Insert Table 18 about here 



In summary, the analysis of individual items composing 
errant clusters has produced several results. First, of the 
61 item performances studied, 34 were statistically 
significant: 22 performances exhibited only main effects, 
five showed both main effects and interactions, and seven
only interaction effects. Of the main effects, only three 
were of a magnitude large enough to be considered 
practically meaningful. Two of these three items were 
differentially difficult and one differentially easy. The 
deviant operation of all three items was associated with
visually impaired students taking the braille version of the
SAT.

Second, the small number of unequivocally deviant items
discovered for visually impaired students taking braille 
tests suggests that graphics multiple choice and 
miscellaneous items are not generally inappropriate for this 
group; several items in these clusters appeared to operate 
equivalently for visually impaired and nondisabled students. 
Rather, selected items falling within these broad classes 
may be inappropriate because they appear to measure 
constructs other than mathematical reasoning ability. Such
items may present unfamiliar symbol systems (e.g., the tally
item), add linguistic complexity as a result of modified
translations, or require cognitive-spatial operations that
are not easily performed by blind students and which are
only tangentially related to mathematics reasoning (e.g., the
graduated cylinder item).

Last, the analysis suggests that the .2 criterion used 
for cluster screening was relatively sensitive. Several 
clusters exceeding the criterion were found to be composed 
of items evidencing minimal effects (e.g., WSA3 Algebra 
Comparisons). Even for those clusters far exceeding the .2 
criterion, only a few isolated instances of differential 
item performance were detected. 

Summary and Recommendations 

This study has investigated the psychometric
characteristics of special administrations of the SAT for
nine handicapped groups. Data on these characteristics are
central to evaluating the accuracy of scores for measuring
the developed scholastic abilities of disabled examinees.

With respect to level of performance, the first
characteristic described, handicapped groups varied widely.
In general, visually impaired students and those with
physical handicaps achieved mean scores comparable to
students taking the SAT in national administrations. In
contrast, learning disabled and hearing impaired students
performed more poorly than the general SAT-taking
population, usually by at least a half standard deviation.
In addition, most groups showed differences between Verbal
and Mathematical scores that were comparable to the
reference population, with the exception of hearing
impaired-regular students, who performed relatively better
on Mathematical than Verbal, and visually impaired-braille
students, who showed no consistent superiority for the
Mathematical over the Verbal scale.

In contrast to level of performance, the reliability of 
the SAT was found to be comparable to the reference 
population for all handicapped groups. This finding 
suggests that one potential source of unfairness,
differences in measurement precision, probably need not be
of practical concern.

To ensure that the time extensions allowed in special 
administrations are enough to permit disabled students to
complete the same proportion of the test as their
nonhandicapped peers, a third psychometric characteristic,
test speededness, was examined. With respect to the
reference sample, no disability group was found to be
disadvantaged by lack of time, thus suggesting that another
possible source of unfairness is probably of little import.

The final psychometric characteristic studied was 
unexpected differential performance. Investigation of this
characteristic was conducted to identify potentially biased
item types, types that may not measure the ability assessed
by the overall test. Differential performance was evaluated
through a two-stage procedure in which the operation of
groups of items was first investigated. Five item
groupings, or clusters, were identified by this procedure as
potentially problematic. The individual items in these
clusters were then subjected to a more rigorous analysis to
discover whether these broad item classes, or only isolated
items, were responsible for cluster effects. This analysis
identified only three items, all for visually impaired
students taking the braille version, that showed clear
evidence of idiosyncratic operation.

The localization of idiosyncratically operating items 
to visually impaired students taking the braille exam
suggests that extra care be taken in the development and
translation to braille of SAT forms used by the Admissions
Testing Program's Services for Handicapped Students. In
addition, the possibility should be considered of pilot
testing brailled exams before these tests are put into
actual service. The aim of such testing would be to detect
any items tapping inappropriate skills, confusing
instructions, errors in brailling, or other remaining
irrelevant sources of difficulty. Pilot testing need not be
carried out with large numbers of examinees. More desirable
would be individual or small-group administrations in which
examinees could discuss potential difficulties as they arise
directly with test development staff. Finally, as an
additional check on the success of the test development and
brailling processes, periodic analyses of the operation of 
items on the braille exam should be considered. 

In contrast to visually impaired-braille examinees, no
items showing practically important indications of 
differential performance were found for hearing impaired- 
regular or learning disabled-cassette students. In 
addition, the large majority of effects that were detected 
for these two groups were positive, suggesting no negative 
impact on total score. 

In conclusion, with the exception of performance level, 
the psychometric characteristics of the SAT forms studied 
appear to be largely comparable for the disabled and 
nonhandicapped groups taking part in this investigation. 
This result should extend to other forms of the SAT and 
other disabled students to the extent that these groups and 
forms, and the conditions under which they are administered, 
are similar to those employed in the study. That the 
psychometric characteristics of the test are similar across 
populations provides some of the evidence necessary to 
support SAT scores as accurate and fair indicators of the 
developed scholastic abilities of disabled students. 
Further evidence from factor analyses and predictive 
validity studies should add knowledge about the meaning of 
these scores for handicapped examinees. 
Table 1a 

Numbers of Students Taking Each Type 
of SAT Special Administration for WSA3 

                                     Group (a)
Exam                                               Mul-     Un-
Type                     VI     PH     HI     LD   tiple   known   Total

Braille                  98      0      1      2      0       1     102
Large-type              486     30      6    185     18       1     726
Cassette                 27      2      1    107      3       0     140
Regular                 223    346    287   2983     27      23    3889
Cassette & large type    29      4      0     23      4       1      61
Braille & cassette        5      1      0      0      0       0       6
Cassette & regular       16      1      1    192      1       0     211
Unknown                   9      6      1     60      1       1      78

Total                   893    390    297   3552     54      27

(a) VI = visually impaired, PH = physically handicapped, HI = 
hearing impaired, LD = learning disabled. 
Table 1b 

Numbers of Students Taking Each Type 
of SAT Special Administration for WSA5 

                                     Group (a)
Exam                                               Mul-     Un-
Type                     VI     PH     HI     LD   tiple   known   Total

Braille                 105      1      0      1      0       0     107
Large-type              498     16      5    136     15       6     676
Cassette                 11      0      0    113      2       0     126
Regular                 175    230    150   2316     29      24    2924
Cassette & large type    27      0      0     25      1       0      53
Braille & cassette       21      1      0      1      0       0      23
Cassette & regular       12      1      0    253      4       1     271
Unknown                   9      5      4     38      0       0      56

Total                   858    254    159   2883     51      31    4236

(a) VI = visually impaired, PH = physically handicapped, HI = 
hearing impaired, LD = learning disabled. 
Table 2 

Sample Sizes and Acronyms Used 
to Denote Disability Groups 

                                                 WSA3      WSA5
                                                Sample    Sample
Acronym   Disability Group                       Size      Size

HIR       Hearing impaired students 
          taking regular-type edition             287       150

LDC       Learning disabled students 
          taking cassette edition                 107       113

LDCR      Learning disabled students 
          taking cassette and regular- 
          type editions                           192       253

LDL       Learning disabled students 
          taking large-type edition               185       136

LDR       Learning disabled students 
          taking regular-type edition            2983      2316

PHR       Physically handicapped students 
          taking regular-type edition             346       230

VIB       Visually impaired students 
          taking braille edition                   98       105

VIL       Visually impaired students 
          taking large-type edition               486       498

VIR       Visually impaired students 
          taking regular-type edition             223       175
Table 3 

Performance of Nonhandicapped Students on WSA3 and WSA5 
Relative to Students Taking the SAT from 3/80 to 6/83 

                               WSA3 

Group                        Verbal     Mathematical

Seniors taking form 
   Mean                        448          493
   SD                         (108)        (116)

Seniors taking SAT 
from 3/80-6/83 (a)
   Mean                        413          454
   SD                         (104)        (112)

                               WSA5 

Group                        Verbal     Mathematical

Juniors taking form 
   Mean                        424          459
   SD                         (107)        (113)

Juniors taking SAT 
from 3/80-6/83 (a)
   Mean                        442          489
   SD                         (103)        (112)

(a) Calculated from statistics presented in College Board Admissions 
Testing Program Statistical Summaries (Cook, Petersen, & Ervin, 
1980; Cook, Petersen, & Jacob, 1981; Cook, Petersen, & Flesher, 
1982; Cook, Petersen, & Zicha, 1983; Cook, Petersen, & Dorans, 
1984). 
Table 4 

The SAT Performance of Nine Disability Groups 

                   Verbal Scaled Scores 

          WSA3                          WSA5 

Group     Mean     SD         Group     Mean          SD

NHA       424     106         VIR       [illegible]   [illegible]
PHR       423     112         VIB       434           134
VIB       412     127         VIL       433           111
VIR       401     101         PHR       432           107
VIL       400     110         NHA       424           106
LDR       370      97         LDR       376            96
LDCR      351      81         LDL       366           [?]6
LDC       349      86         LDCR      350            85
LDL       349      91         LDC       328            82
HIR       284      91         HIR       326           103

                Mathematical Scaled Scores 

          WSA3                          WSA5 

Group     Mean     SD         Group     Mean          SD

NHA       468     114         VIR       491           133
VIR       456     135         NHA       468           114
PHR       434     131         VIL       468           128
VIL       431     129         PHR       460           116
LDR       411     121         VIB       438           133
LDCR      378      98         LDR       412           111
VIB       376     113         HIR       407           111
LDL       374     [illegible] LDL       391            95
HIR       373     116         LDCR      374            93
LDC       365     101         LDC       360            86

NHA denotes all students taking forms of the SAT administered 
between 3/80 and 6/83. Scores for this group calculated from 
statistics presented in College Board Admissions Testing Program 
Statistical Summaries (Cook, Petersen, & Ervin, 1980; Cook, 
Petersen, & Jacob, 1981; Cook, Petersen, & Flesher, 1982; Cook, 
Petersen, & Zicha, 1983; Cook, Petersen, & Dorans, 1984). 
Table 5 

Disabled Student SAT Performance in 
SD Units from the Nonhandicapped Student Mean (a)

                      Verbal 

            WSA5        WSA3
           Differ-     Differ-     Weighted
Group       ence        ence       Average

PHR         0.08       -0.01         0.02
VIB         0.09       -0.11        -0.01
VIR         0.11       -0.22        -0.07
VIL         0.08       -0.23        -0.07
LDR        -0.45       -0.51        -0.48
LDL        -0.55       -0.71        -0.64
LDCR       -0.70       -0.69        -0.69
LDC        -0.91       -0.71        -0.81
HIR        -0.92       -1.32        -1.18

                   Mathematical 

            WSA5        WSA3
           Differ-     Differ-     Weighted
Group       ence        ence       Average

VIR         0.20       -0.11         0.03
VIL         0.00       -0.32        -0.16
PHR        -0.07       -0.30        -0.21
LDR        -0.49       -0.50        -0.50
VIB        -0.26       -0.81        -0.53
HIR        -0.54       -0.83        -0.73
LDL        -0.68       -0.82        -0.74
LDCR       -0.82       -0.79        -0.81
LDC        -0.95       -0.90        -0.93

(a) Nondisabled students are all examinees taking the SAT from 3/80 
to 6/83. Differences are expressed in SD units of the 
nonhandicapped group. 
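
The computation behind Table 5 can be sketched as follows. This is a minimal illustration rather than the authors' procedure: the function names are hypothetical, and weighting the two forms by the Table 2 sample sizes is an assumption (the weights are not defined in this section), although it does reproduce the tabled Verbal values for the HIR group.

```python
def sd_unit_difference(group_mean, reference_mean, reference_sd):
    """Distance of a group mean from the reference mean, in reference-SD units."""
    return (group_mean - reference_mean) / reference_sd


def weighted_average(values, weights):
    """Average of per-form differences, weighted by sample size (assumed weights)."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Reference (NHA) Verbal mean and SD, as reported in Table 4.
NHA_VERBAL_MEAN, NHA_VERBAL_SD = 424, 106

# HIR Verbal means from Table 4; HIR sample sizes from Table 2.
hir_wsa3 = sd_unit_difference(284, NHA_VERBAL_MEAN, NHA_VERBAL_SD)
hir_wsa5 = sd_unit_difference(326, NHA_VERBAL_MEAN, NHA_VERBAL_SD)
hir_average = weighted_average([hir_wsa3, hir_wsa5], [287, 150])

print(round(hir_wsa3, 2), round(hir_wsa5, 2), round(hir_average, 2))
# prints: -1.32 -0.92 -1.18
```

The printed values agree with the HIR row of the Verbal panel.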



Table 6 

Difference Between SAT Verbal and Mathematical Scores 
for Handicapped Students (a)

              WSA3            WSA5
           Difference      Difference
Group        Index           Index

HIR           2.14            1.89
VIR           1.15            1.15
LDR           0.94            0.87
LDCR          0.75            0.67
VIL           0.65            0.73
LDL           0.64            0.65
LDC           0.43            0.95
PHR           0.23            0.63
VIB          -0.75            0.07

(a) Difference index is the ratio of the difference between Verbal 
and Mathematical mean scaled scores for a handicapped group 
divided by the pooled standard deviation for those scores to the 
same quantity calculated for all students taking the SAT between 
3/80 and 6/83. A difference index of +1 indicates intra-test 
performance equivalent in magnitude and direction to the 
reference group. 
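
The difference index described in the note to Table 6 can be illustrated with a short sketch. The function names are hypothetical, and the pooling rule (square root of the mean of the two variances) is an assumption, since the note does not state how the standard deviations were pooled; with that rule the sketch reproduces the WSA3 entries for HIR and VIB from the Table 4 means and SDs.

```python
import math


def pooled_sd(sd_a, sd_b):
    """Pooled SD taken as the root of the mean variance (an assumed pooling rule)."""
    return math.sqrt((sd_a ** 2 + sd_b ** 2) / 2)


def standardized_gap(verbal_mean, verbal_sd, math_mean, math_sd):
    """Verbal minus Mathematical mean, divided by the pooled SD of the two scores."""
    return (verbal_mean - math_mean) / pooled_sd(verbal_sd, math_sd)


def difference_index(group, reference):
    """Ratio of the group's standardized gap to the reference group's gap."""
    return standardized_gap(*group) / standardized_gap(*reference)


# (verbal mean, verbal SD, math mean, math SD), taken from Table 4.
reference = (424, 106, 468, 114)   # all students, 3/80 to 6/83
hir_wsa3 = (284, 91, 373, 116)
vib_wsa3 = (412, 127, 376, 113)

hir_index = difference_index(hir_wsa3, reference)
vib_index = difference_index(vib_wsa3, reference)
print(round(hir_index, 2), round(vib_index, 2))
# prints: 2.14 -0.75
```

A negative index, as for VIB on WSA3, means the group's Verbal mean exceeds its Mathematical mean, the reverse of the reference group's pattern.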
Table 7 

Differences in SD Units Between Scaled Score Means of 
Disabled Student Groups Taking WSA3 and WSA5 (a)

               Verbal          Math 
Group        Difference      Difference 

HIR           .44 ***         .30 ** 
VIR           .34 ***         .26 *** 
VIL           .30 ***         .29 *** 
LDL           .18             .17 
VIB           .17             .50 * 
PHR           .08             .21 * 
LDR           .06 *           .01 
LDCR         -.01            -.04 
LDC          -.25            -.05 

  * p < .05 
 ** p < .01 
*** p < .001 

(a) Differences are calculated by subtracting the WSA3 mean from the 
WSA5 mean for a handicapped group and dividing by the pooled 
Verbal or Mathematical standard deviation for that group. 
Significance of differences was tested using the two-tailed 
t-test. 
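
As an illustration of the note to Table 7, the sketch below computes the standardized WSA5 minus WSA3 difference and the matching two-sample t statistic. The names are hypothetical, and pooling the two SDs with degrees-of-freedom weights is an assumption not spelled out in the note; under that assumption the HIR Verbal entry of .44 is recovered from the Table 4 means and SDs and the Table 2 sample sizes.

```python
import math


def pooled_sd(sd1, n1, sd2, n2):
    """Pooled SD with each variance weighted by its degrees of freedom (assumed)."""
    return math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2))


def form_difference(mean_wsa3, sd_wsa3, n_wsa3, mean_wsa5, sd_wsa5, n_wsa5):
    """Return (difference in pooled-SD units, t statistic, degrees of freedom)."""
    sp = pooled_sd(sd_wsa3, n_wsa3, sd_wsa5, n_wsa5)
    d = (mean_wsa5 - mean_wsa3) / sp
    # Equal-variance two-sample t statistic expressed through d.
    t = d / math.sqrt(1 / n_wsa3 + 1 / n_wsa5)
    return d, t, n_wsa3 + n_wsa5 - 2


# HIR Verbal: means and SDs from Table 4, sample sizes from Table 2.
d, t, df = form_difference(284, 91, 287, 326, 103, 150)
print(round(d, 2), df)
# prints: 0.44 435
```

With 435 degrees of freedom, a |t| above roughly 3.29 corresponds to a two-tailed p below .001, which is consistent with the three asterisks shown for this entry.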
Table 8 

SAT Reliability for Disability Groups 

                     Verbal Section 

              Alpha Reliability      SE Measurement (b)

Group (a)      WSA3      WSA5         WSA3      WSA5

VIB            .93       .95          3.87      3.74
NHF            .92       .92          3.73      3.75
VIL            .91       .92          3.82      3.74
PHR            .91       .91          3.83      3.75
VIR            .90       .91          3.79      3.77
LDR            .89       .90          3.80      3.77
HIR            .88       .91          3.81      3.79
LDL            .87       .89          3.84      3.80
LDC            .86       .84          3.76      3.74
LDCR           .85       .86          3.82      3.81

                  Mathematical Section 

              Alpha Reliability      SE Measurement (b)

Group (a)      WSA3      WSA5         WSA3      WSA5

VIR            .94       .94          3.11      3.07
VIB            .93       .94          3.08      3.07
VIL            .93       .93          3.15      3.11
PHR            .93       .92          3.13      3.17
LDR            .93       .92          3.11      3.14
HIR            .92       .91          3.15      3.16
NHF            .92       .91          3.09      3.15
LDL            .91       .89          3.11      3.14
LDCR           .91       .89          3.11      3.13
LDC            .90       .86          3.08      3.12

(a) NHF = nonhandicapped students taking WSA3 or WSA5. 

(b) Standard errors of measurement are in raw score units. 
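
For readers unfamiliar with the two statistics in Table 8, the following sketch shows the textbook definitions of coefficient alpha and the standard error of measurement. It is not the report's scoring code: the function names and the small item-score matrix are hypothetical, and the sketch ignores any formula-scoring adjustments that may have been applied to SAT raw scores.

```python
import math
from statistics import variance


def cronbach_alpha(item_scores):
    """Coefficient alpha for a list of examinee rows, one score per item."""
    k = len(item_scores[0])
    item_variances = [variance(column) for column in zip(*item_scores)]
    total_variance = variance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


def standard_error_of_measurement(total_sd, reliability):
    """SEM in the same units as total_sd: SD * sqrt(1 - reliability)."""
    return total_sd * math.sqrt(1 - reliability)


# Hypothetical right/wrong (1/0) responses of four examinees to three items.
responses = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
alpha = cronbach_alpha(responses)

# Hypothetical raw-score SD of 10 combined with a reliability of .91.
sem = standard_error_of_measurement(10.0, 0.91)
```

Because the SEM depends on both the reliability and the raw-score SD, two groups with different alphas can still show nearly identical SEMs, as in the table.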
Table 9 

SAT Speededness for Disability Groups Compared with 
Nonhandicapped Students Taking the Same Test Form 

                     Verbal Section I 

                                     Ratio of Percent (a)
           Ratio of Percent (a)      Completing 75% of 
           Completing Section        Section 

Group       WSA3      WSA5            WSA3      WSA5

LDC         1.23      1.34            1.00      1.01
PHR         1.23      1.29            1.00      1.01
VIR         1.22      1.34            1.00      1.01
VIL         1.22      1.34            1.00      1.01
LDR         1.19      1.32            1.00      1.01
LDL         1.19      1.32            1.00      1.01
LDCR        1.16      1.26            1.00      1.01
VIB         1.15      1.26            1.00      1.01
HIR         1.14      1.33            1.00      1.01

                     Verbal Section II 

                                     Ratio of Percent (a)
           Ratio of Percent (a)      Completing 75% of 
           Completing Section        Section 

Group       WSA3      WSA5            WSA3      WSA5

LDC         1.09      1.23            1.03      1.03
PHR         1.23      1.08            1.03      1.03
VIR         1.04      1.12            1.03      1.03
VIL         1.19      1.09            1.03      1.03
LDR          .98       .95            1.03      1.03
LDL         1.00      1.03            1.03      1.03
LDCR        1.05      1.03            1.03      1.02
VIB         1.44      1.14            1.02      1.03
HIR         1.25      1.21            1.03      1.02

(a) Ratio is the percentage of disabled students divided by the 
equivalent value for nondisabled students. Values above 1.00 
indicate a higher percentage of disabled than nondisabled 
students completing the section or part section. 
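
The ratio defined in the note to Table 9 amounts to the following. The function names are hypothetical and the counts are invented purely to show the arithmetic; the underlying completion percentages are not given in this section.

```python
def percent_completing(n_completing, n_examinees):
    """Percentage of examinees who reached the end of the section (or 75% of it)."""
    return 100.0 * n_completing / n_examinees


def completion_ratio(pct_disabled, pct_nondisabled):
    """Disabled-group percentage divided by the nondisabled percentage."""
    if pct_nondisabled == 0:
        raise ValueError("nondisabled completion percentage must be nonzero")
    return pct_disabled / pct_nondisabled


# Hypothetical counts: 90 of 100 disabled examinees and 750 of 1000
# nondisabled examinees complete the section.
ratio = completion_ratio(percent_completing(90, 100),
                         percent_completing(750, 1000))
print(round(ratio, 2))
# prints: 1.2
```

A ratio above 1.00 therefore means that proportionally more disabled than nondisabled examinees completed the section, as would be expected when special administrations allow extended time.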
Table 9 (cont'd) 

SAT Speededness for Disability Groups Compared with 
Nonhandicapped Students Taking the Same Test Form 

                  Mathematical Section I 

                                     Ratio of Percent (a)
           Ratio of Percent (a)      Completing 75% of 
           Completing Section        Section 

Group       WSA3      WSA5            WSA3      WSA5

VIL         1.00      1.13            1.00      1.08
HIR          .99      1.20             .99      1.08
LDC          .97       .96             .99      1.03
VIR          .96      1.23            1.00      1.07
LDR          .91      1.01             .99      1.04
LDCR         .91       .99             .98      1.01
PHR          .89      1.19            1.00      1.05
LDL          .88       .91            1.00      1.07
VIB          .88      1.00             .99      1.03

                  Mathematical Section II 

                                     Ratio of Percent (a)
           Ratio of Percent (a)      Completing 75% of 
           Completing Section        Section 

Group       WSA3      WSA5            WSA3      WSA5

VIL         1.[?]2    1.22            1.02      1.03
HIR         1.88      1.20            1.02      1.02
LDC          .67      1.04            1.02       .94
VIR         1.94      1.20            1.02      1.04
LDR         1.73      1.16            1.02      1.01
LDCR        1.69      1.09            1.02       .98
PHR         1.84      1.21            1.02      1.02
LDL         1.71      1.13            1.02       .99
VIB         1.76      1.17            1.02      1.00

(a) Ratio is the percentage of disabled students divided by the 
equivalent value for nondisabled students. Values above 1.00 
indicate a higher percentage of disabled than nondisabled 
students completing the section or part section. 
Table 10 

Extent of Unexpected Differential Performance 
in SD Units on SAT Verbal Item Clusters (a)

               Antonyms              Analogies 
               (n = 25)              (n = 20) 

Group       WSA3      WSA5        WSA3      WSA5

VIR          .00       .00         .00       .00
VIL          .01       .01        -.03      -.04
VIB          .05       .06        -.05      -.17
PHR          .10       .01        -.04      -.06
LDR         -.04      -.06        -.01      -.08
LDL          .02       .04         .02      -.13
LDCR        -.01      -.02        -.12      -.02
LDC         -.02      -.08         .00      -.07
HIR         -.07      -.04         .10       .05

               Sentence              Reading 
              Completion          Comprehension 
               (n = 15)              (n = 25) 

Group       WSA3      WSA5        WSA3      WSA5

VIR          .00       .00         .00       .00
VIL          .05       .07         .05      -.01
VIB         -.05       .07         .12      -.03
PHR          .05       .06         .04      -.02
LDR          .02      -.01        -.05      -.01
LDL          .05       .04        -.15       .01
LDCR         .11       .03        -.01      -.01
LDC          .20       .06        -.10       .11
HIR         -.17      -.16         .11       .10

(a) Tabled values represent the difference between the actual and 
predicted mean cluster raw scores for each handicapped group 
divided by that group's cluster standard deviation. Positive 
values indicate better performance than expected while negative 
values denote the converse. An absolute value in excess of .2 on 
both forms is considered practically important. 
Table 11 

Extent of Unexpected Differential Performance 
in SD Units on SAT Mathematical Graphics-Load Clusters (a)

                 Text Multiple               Text 
                    Choice                Comparisons 

             WSA3       WSA5          WSA3       WSA5
Group       (n=30)     (n=32)        (n=14)     (n=13)

VIR          .00        .00           .00        .00
VIL          .06        .02          -.03        .02
VIB          .07        .06           .00        .05
PHR          .03        .01           .00       -.02
LDR         -.05       -.07          -.07       -.05
LDL         -.04       -.05          -.07       -.08
LDCR        -.10       -.12          -.08       -.06
LDC         -.08       -.11          -.13       -.03
HIR         -.01       -.05           .07       -.07

                  Graphics                 Graphics 
               Multiple Choice            Comparisons 

             WSA3       WSA5          WSA3       WSA5
Group       (n=10)     (n=8)         (n=5)      (n=7)

VIR          .00        .00           .00        .00
VIL          .02       -.06           .06       -.07
VIB         -.31       -.46          -.17       -.49
PHR         -.02       -.15           .11        .04
LDR          .05        .01           .02        .03
LDL         -.02       -.02          -.03       -.05
LDCR         .04        .13           .01        .05
LDC          .07        .14           .15       -.05
HIR          .18       -.01          -.03        .25

(a) Tabled values represent the difference between the actual and 
predicted mean cluster raw scores for each handicapped group 
divided by that group's cluster standard deviation. Positive 
values indicate better performance than expected while negative 
values denote the converse. An absolute value in excess of .2 on 
both forms is considered practically important. 
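
The screening rule stated in the notes to Tables 10 through 13 can be sketched as follows. The function names are hypothetical, and the sketch applies the rule to the rounded tabled values, whereas the original analysis presumably worked from unrounded figures.

```python
def standardized_residual(actual_mean, predicted_mean, cluster_sd):
    """Actual minus predicted cluster mean, in units of the group's cluster SD."""
    return (actual_mean - predicted_mean) / cluster_sd


def practically_important(wsa3_value, wsa5_value, threshold=0.2):
    """True when the absolute value exceeds the threshold on both forms."""
    return abs(wsa3_value) > threshold and abs(wsa5_value) > threshold


# (WSA3, WSA5) entries taken from the graphics clusters of Table 11.
vib_graphics_multiple_choice = (-0.31, -0.46)
vib_graphics_comparisons = (-0.17, -0.49)
hir_graphics_comparisons = (-0.03, 0.25)

flags = [
    practically_important(*vib_graphics_multiple_choice),
    practically_important(*vib_graphics_comparisons),
    practically_important(*hir_graphics_comparisons),
]
print(flags)
# prints: [True, False, False]
```

Requiring the effect on both forms guards against flagging a cluster on the strength of a single form's sampling fluctuation; of these three examples only the VIB Graphics Multiple Choice cluster meets the criterion.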






Table 12 

Extent of Unexpected Differential Performance 
in SD Units on SAT Mathematical Item-Content Clusters (a)

                 Arithmetic                 Algebra 
               Multiple Choice          Multiple Choice 

             WSA3       WSA5          WSA3       WSA5
Group       (n=11)     (n=12)        (n=12)     (n=11)

VIR         -.07       -.05          -.13       -.08
VIL          .01       -.02          -.01        .02
VIB          .15        .08           .04        .04
PHR         -.04       -.03           .01       -.01
LDR         -.15       -.14          -.10       -.14
LDL         -.21       -.10          -.04       -.14
LDCR        -.18       -.14          -.14       -.22
LDC         -.22       -.18          -.07       -.24
HIR         -.14       -.03          -.06       -.08

                  Geometry               Miscellaneous 
               Multiple Choice          Multiple Choice 

             WSA3       WSA5          WSA3       WSA5
Group       (n=11)     (n=11)        (n=6)      (n=6)

VIR         -.01        .09          -.10       -.02
VIL         -.01       -.02          -.17       -.02
VIB         -.23       -.17          -.65       -.20
PHR         -.07       -.12          -.22        .04
LDR          .04        .02          -.23        .01
LDL          .05        .00          -.36       -.01
LDCR         .06        .12          -.35       -.08
LDC          .08        .17          -.29        .03
HIR          .18        .02          -.18       -.14

(a) Tabled values represent the difference between the actual and 
predicted mean cluster raw scores for each handicapped group 
divided by that group's cluster standard deviation. Positive 
values indicate better performance than expected while negative 
values denote the converse. An absolute value in excess of .2 on 
both forms is considered practically important. 
Table 12 (cont'd) 

Extent of Unexpected Differential Performance 
in SD Units on SAT Mathematical Item-Content Clusters 

                 Arithmetic                 Algebra 
                 Comparisons              Comparisons 

             WSA3       WSA5          WSA3       WSA5
Group       (n=6)      (n=7)         (n=6)      (n=5)

VIR          .20        .11           .20        .11
VIL          .26        .0[?]         .16        .21
VIB          .33        .03           .23        .15
PHR          .28        .02           .22        .18
LDR          .14        .03           .20        .10
LDL          .17        .05           .30        .18
LDCR         .12        .01           .21        .13
LDC          .12       -.06           .27        .29
HIR          .25       -.06           .26        .26

                  Geometry 
                 Comparisons 

             WSA3       WSA5
Group       (n=6)      (n=6)

VIR          .09       -.04
VIL          .10       -.06
VIB         -.09       -.27
PHR          .11        .03
LDR          .03        .00
LDL         -.13       -.11
LDCR         .03        .09
LDC          .03       -.04
HIR          .03        .16

Tabled values represent the difference between the actual and 
predicted mean cluster raw scores for each handicapped group 
divided by that group's cluster standard deviation. Positive 
values indicate better performance than expected while negative 
values denote the converse. An absolute value in excess of .2 on 
both forms is considered practically important. 






Table 13 

Extent of Unexpected Differential Performance 
in SD Units on SAT Mathematical Reading Load Clusters (a)

                                               Minimal 
                 Nonreading                    Reading 

             WSA3          WSA5            WSA3       WSA5
Group       (n=8)         (n=13)          (n=22)     (n=13)

VIR          .17           .03            -.05        .07
VIL          .17           .06            -.02        .05
VIB          .23           .04            -.18       -.20
PHR          .19           .06            -.06        .07
LDR          .15          -.01            -.12        .06
LDL          [illegible]   .02            -.23       -.10
LDCR         .15          -.07            -.21        .05
LDC          .11           .02            -.24       -.05
HIR          .36           .20             .01        .23

                   Reading 

             WSA3          WSA5
Group       (n=29)        (n=34)

VIR          .01          -.01
VIL          .06          -.01
VIB          .00           .00
PHR          .06          -.04
LDR          .01          -.09
LDL          .05          -.06
LDCR         .02          -.07
LDC         [?].12        -.08
HIR         -.03          -.18

(a) Tabled values represent the difference between the actual and 
predicted mean cluster raw scores for each handicapped group 
divided by that group's cluster standard deviation. Positive 
values indicate better performance than expected while negative 
values denote the converse. An absolute value in excess of .2 on 
both forms is considered practically important. 
Table 14 

Performance of Visually Impaired-Braille (VIB) and 
Nonhandicapped Students (NHF) (a) on Graphics Multiple Choice Items 

                          WSA3 

              VIB             NHF 
           Probability     Probability      Differ- 
Item       of Passing      of Passing       ence (b)

I   3         .12             .16           -.04 
    4         .03             .10           -.07 *** 
   17         .02             .04           -.02 
   18         .03             .02            .01 
   22         .01             .01            .00 
   24         .03             .01            .01 
II  5         .08             .05            .03 * 
    6         .11             .45           -.34 *** 
    7         .06             .14           -.08 ** 
   12         .01             .01            .00 

[The Interaction column for WSA3 contains two entries, "**" and 
"x **"; the rows to which they belong cannot be determined from 
this copy.] 

                          WSA5 

              VIB             NHF 
           Probability     Probability      Differ- 
Item       of Passing      of Passing       ence (b)

I   3         .06             .14           -.08 ** 
    7         .08             .11           -.03 
   14         .01             .01            .00 
   20         .03             .10           -.07 *** 
   24         .02             .02            .00 
II  9         .01             .02           -.01 
   12         .03             .04           -.01 
   13         .01             .01            .00 

[The Interaction column for WSA5 contains one entry, "x ***"; the 
row to which it belongs cannot be determined from this copy.] 

  * p < .05 
 ** p < .01 
*** p < .001 

(a) Performance data are for nonhandicapped students taking forms 
WSA3 and WSA5. 

(b) Differences may not reflect the computed difference between the 
handicapped and nonhandicapped columns due to rounding error. 
Table 15 

Performance of Visually Impaired-Braille (VIB) and 
Nonhandicapped Students (NHF) (a) on 
Miscellaneous Multiple Choice Items 

                          WSA3 

                 VIB             NHF 
              Probability     Probability      Differ- 
Item          of Passing      of Passing       ence (b)

I   3            .12             .16           -.04 
[illegible]      .11             .45           -.34 *** 
    7            .06             .14           -.08 ** 
   20            .03             .06           -.03 ** 
   21            .01             .04           -.03 *** 
   23            .04             .02            .02 ** 

[The Interaction column for WSA3 contains two entries, "x **" and 
"x *"; the rows to which they belong cannot be determined from 
this copy.] 

                          WSA5 

                 VIB             NHF 
              Probability     Probability      Differ-      Inter- 
Item          of Passing      of Passing       ence (b)     action 

I   3            .06             .14           -.08 ** 
   10            .35             .23            .12 * 
   15            .04             .03            .01 
   23            .01             .02            .00 
   25            .01             .02           -.01 
II  5            .08             .15           -.07 ** 

  * p < .05 
 ** p < .01 
*** p < .001 

(a) Performance data are for nonhandicapped students taking forms 
WSA3 and WSA5. 

(b) Differences may not reflect the computed difference between the 
handicapped and nonhandicapped columns due to rounding error. 
Table 16 

Performance of Hearing Impaired-Regular (HIR) and 
Nonhandicapped Students (NHF) (a) on Nonreading Items 

                          WSA3 

              HIR             NHF 
           Probability     Probability      Differ-      Inter- 
Item       of Passing      of Passing       ence (b)     action 

II 16         .09             .10           -.01 
   18         .93             .95           -.02 
   20         .05             .04            .01 
   22         .97             .96            .01 
   25         .08             .06            .02 
   30         .07             .02            .05 ***     x ** 
   32         .02             .01            .01 *** 
   34         .07             .02            .05 ***     x *** 

                          WSA5 

              HIR             NHF 
           Probability     Probability      Differ-      Inter- 
Item       of Passing      of Passing       ence (b)     action 

I   2         .17             .13           -.01 
   17         .06             .05            .01 
   18         .14             .10            .05 * 
   21         .03             .01            .02 *** 
II 18         .05             .08           -.03 * 
   21         .04             .05           -.01         x *** 
   23         .08             .08            .01 
   24         .13             .07            .06 *** 
   26         .00             .01            .00 
   28         .06             .07           -.01 
   31         .11             .07            .05 ** 
   32         .10             .04            .06 ***     x * 
   35         .01             .01            .00 

  * p < .05 
 ** p < .01 
*** p < .001 

(a) Performance data are for nonhandicapped students taking forms 
WSA3 and WSA5. 

(b) Differences may not reflect the computed difference between the 
handicapped and nonhandicapped columns due to rounding error. 
Table 17 

Performance of Hearing Impaired-Regular (HIR) and 
Nonhandicapped Students (NHF) (a) on Algebra Comparisons Items 

                          WSA3 

              HIR             NHF 
           Probability     Probability      Differ- 
Item       of Passing      of Passing       ence (b)

II 18         .93             .95           -.02 
   21         .04             .08           -.03 *** 
   22         .97             .96            .01 
   25         .08             .06            .02 
   32         .02             .01            .01 *** 
   34         .07             .02            .05 *** 

[The Interaction column for WSA3 contains one entry, "***"; the 
row to which it belongs cannot be determined from this copy.] 

                          WSA5 

              HIR             NHF 
           Probability     Probability      Differ-      Inter- 
Item       of Passing      of Passing       ence (b)     action 

II 23         .08             .08            .01 
   24         .13             .07            .06 *** 
   26         .00             .01            .00 
   32         .10             .04            .06 *** 
   35         .01             .01            .00 

  * p < .05 
 ** p < .01 
*** p < .001 

(a) Performance data are for nonhandicapped students taking forms 
WSA3 and WSA5. 

(b) Differences may not reflect the computed difference between the 
handicapped and nonhandicapped columns due to rounding error. 
Table 18 

Performance of Learning Disabled-Cassette (LDC) and 
Nonhandicapped Students (NHF) (a) on Algebra Comparisons Items 

                          WSA3 

              LDC             NHF 
           Probability     Probability      Differ-      Inter- 
Item       of Passing      of Passing       ence (b)     action 

II 18         .95             .95            .00 
   21         .10             .08            .02 
   22         .95             .96           -.01 
   25         .07             .06            .01 
   32         .04             .01            .03 *** 
   34         .04             .02            .02 ** 

                          WSA5 

              LDC             NHF 
           Probability     Probability      Differ-      Inter- 
Item       of Passing      of Passing       ence (b)     action 

II 23         .06             .08            .01 
   24         .10             .07            .04 ** 
   26         .01             .01            .00 
   32         .10             .04            .06 *** 
   35         .01             .01            .00 

  * p < .05 
 ** p < .01 
*** p < .001 

(a) Performance data are for nonhandicapped students taking forms 
WSA3 and WSA5. 

(b) Differences may not reflect the computed difference between the 
handicapped and nonhandicapped columns due to rounding error. 
Figure 1 
SAT Verbal Item Types 

Analogies 

Each question below consists of a related pair of words 
or phrases, followed by five lettered pairs of words or 
phrases. Select the lettered pair that best expresses a 
relationship similar to that expressed in the original pair. 
Example: 

YAWN:BOREDOM:: (A) dream:sleep 
(B) anger:madness (C) smile:amusement 
(D) face:expression (E) impatience:rebellion 

36. COW:BARN:: (A) pig:mud (B) chicken:coop 
(C) camel:water (D) cat:tree 
(E) horse:racetrack 

Antonyms 

Each question below consists of a word in capital letters 
followed by five lettered words or phrases. Choose the 
word or phrase that is most nearly opposite in meaning 
to the word in capital letters. Since some of the 
questions require you to distinguish fine shades of meaning, 
consider all the choices before deciding which is best. 
Example: 

GOOD: (A) sour (B) bad (C) red 
(D) hot (E) ugly 

1. VERSATILE: (A) unadaptable (B) mediocre 
(C) impatient (D) egocentric (E) vicious 

2. FRAUDULENT: (A) rather pleasing 
(B) extremely beneficial (C) courteous 
(D) authentic (E) simplified 

Sentence Completion 

Each sentence below has one or two blanks, each blank 
indicating that something has been omitted. Beneath 
the sentence are five lettered words or sets of words. 
Choose the word or set of words that best fits the 
meaning of the sentence as a whole. 
Example: 

Although its publicity has been ----, the film itself 
is intelligent, well-acted, handsomely produced, 
and altogether ----. 

(A) tasteless..respectable (B) extensive..moderate 
(C) sophisticated..amateur (D) risqué..crude 
(E) perfect..spectacular 

16. He claimed that the document was ---- because it 
merely listed endangered species and did not specify 
penalties for harming them. 

(A) indispensable (B) inadequate (C) punitive 
(D) aggressive (E) essential 
Figure 1 (cont'd) 
SAT Verbal Item Types 

Reading Comprehension 

Each passage below is followed by questions based on its content. 
Answer all questions following a passage on the basis of 
what is stated or implied in that passage. 

Mars revolves around the Sun in 687 Earth days, 
which is equivalent to 23 Earth months. The axis of 
Mars's rotation is tipped at a 25° angle from the plane of 
its orbit, nearly the same as the Earth's tilt of about 23°. 
Because the tilt causes the seasons, we know that Mars 
goes through a year with four seasons just as the Earth 
does. 

From the Earth, we have long watched the effect of 
the seasons on Mars. In the Martian winter, in a given 
hemisphere, there is a polar ice cap. As the Martian spring 
comes to the Northern Hemisphere, for example, the 
north polar cap shrinks and material in the planet's more 
temperate zones darkens. The surface of Mars is always 
mainly reddish, with darker gray areas that, from the 
Earth, appear blue green. In the spring, the darker regions 
spread. Half a Martian year later, the same process 
happens in the Southern Hemisphere. 

One possible explanation for these changes is 
biological: Martian vegetation could be blooming or 
spreading in the spring. There are other explanations, however. 
The theory that presently seems most reasonable is that 
each year during the Northern Hemisphere springtime, a 
dust storm starts, with winds that reach velocities as high 
as hundreds of kilometers per hour. Fine, light-colored 
dust is blown from slopes, exposing dark areas underneath. 
If the dust were composed of certain kinds of materials, 
such as limonite, the reddish color would be explained. 

29. It can be inferred that one characteristic of limonite 
is its 

(A) reddish color 
(B) blue green color 
(C) ability to change colors 
(D) ability to support rich vegetation 
(E) tendency to concentrate into a hard surface 

30. According to the author, seasonal variations on 
Mars are a direct result of the 

(A) proximity of the planet to the Sun 
(B) proximity of the planet to the Earth 
(C) presence of ice caps at the poles of the planet 
(D) tilt of the planet's rotational axis 
(E) length of time required by the planet to revolve 
around the Sun 

31. It can be inferred that, as spring arrives in the 
Southern Hemisphere of Mars, which of the 
following is also occurring? 

(A) The northern polar cap is increasing in size. 
(B) The axis of rotation is tipping at a greater angle. 
(C) A dust storm is ending in the Southern 
Hemisphere. 
(D) The material in the northern temperate zones 
is darkening. 
(E) Vegetation in the southern temperate zones is 
decaying. 

Source: College Board (1983). Taking the SAT. New York: Author. 
Figure 2 
SAT Mathematical Item Types 

Multiple Choice 

In this section solve each problem, using any available space on the page for scratchwork. Then decide which is the best 
of the choices given and blacken the corresponding space on the answer sheet. 

The following information is for your reference in solving some of the problems. 

Circle of radius r: Area = πr²; Circumference = 2πr 
The number of degrees of arc in a circle is 360. 
The measure in degrees of a straight angle is 180. 

Definitions of symbols: 
=  is equal to                ≤  is less than or equal to 
≠  is unequal to              ≥  is greater than or equal to 
<  is less than               ∥  is parallel to 
>  is greater than            ⊥  is perpendicular to 

Triangle: The sum of the measures in 
degrees of the angles of a 
triangle is 180. 
If ∠CDA is a right angle, then 

(1) area of △ABC = (AB × CD)/2 

(2) AC² = AD² + DC² 

[Reference triangle figure not reproduced.] 

Note: Figures which accompany problems in this test are intended to provide information useful in solving the problems. 
They are drawn as accurately as possible EXCEPT when it is stated in a specific problem that its figure is not drawn to 
scale. All figures lie in a plane unless otherwise indicated. All numbers used are real numbers. 

If [equation illegible] = 2, then x = 

(A) 0 (B) 1 (C) 2 (D) 3 (E) 4 

A triangle with sides of lengths 4, 8, and 9 has the 
same perimeter as an equilateral triangle with side 
of length 

(A) 5[fraction illegible] (B) 6 (C) 6[fraction illegible] (D) 7 (E) 7[fraction illegible] 

Quantitative Comparison 

Questions 8-27 each consist of two quantities, one in Column A and one in Column B. You are to 
compare the two quantities and on the answer sheet blacken space 

A if the quantity in Column A is greater; 
B if the quantity in Column B is greater; 
C if the two quantities are equal; 
D if the relationship cannot be determined from the information given. 

Notes: 1. In certain questions, information concerning one or both of the quantities to be compared is centered 
above the two columns. 

2. In a given question, a symbol that appears in both columns represents the same thing in Column A as it 
does in Column B. 

3. Letters such as x, n, and k stand for real numbers. 

                    EXAMPLES 
         Column A          Column B 

E1.       2 × 6             2 + 6 
E2.      180 - x              y          [accompanying figure not reproduced] 
E3.       p - q             q - p 

         Column A          Column B 

 8.         0              [illegible] 
 9.     [illegible]        [illegible] 

Source: College Board (1983). Taking the SAT. New York: Author. 
Figure 3 

Items Showing Main Effects Exceeding 
the .1 Criterion 

(a) WSA3 II 6. [Figure not reproduced.] In the figure above, 
where l1 ∥ l2, x = 

(A) 20 (B) 30 (C) 45 (D) 50 (E) 60 

(b) WSA3 I 6. In a certain tally system, 12 is represented by 
[tally marks not reproduced], and 15 is represented by 
[tally marks not reproduced]. How 
many uncrossed tallies would be used in the 
representation of 29 in this system? 

(A) None (B) One (C) Two 
(D) Three (E) Four 

(c) WSA5 N: travel 1 mile north 
E: travel 2 miles east 
S: travel 3 miles south 
W: travel 4 miles west 

I 10. If N, E, S, and W are defined as shown above and 
if a combination of the letters means to perform 
the instructions in the order given, which of the 
following yields the same result as NWS? 

(A) W (B) E (C) SEN (D) EWN (E) WSN 

Copyright (C) Educational Testing Service, 197[?], 1978. Used by permission. 